$256$ discrete values between $0$ and $255$; every value is represented.
For the metrics, the output $y \in [-1,1]$ was transformed to $y' \in [0,1]$ and then rounded to two decimal places, since we are interested in the differences between runs rather than in the absolute values.
For the classifier, the data was not converted.
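A minimal sketch of this conversion in NumPy. It assumes a linear rescaling $(y + 1)/2$ (the text only states the source and target ranges, not the exact mapping) and assumes the rounding is applied to the rescaled outputs; `to_unit_interval` is a hypothetical helper name.

```python
import numpy as np


def to_unit_interval(y: np.ndarray) -> np.ndarray:
    """Map values from [-1, 1] to [0, 1] and round to two decimals.

    Assumption: the mapping is the linear rescaling (y + 1) / 2.
    """
    rescaled = (y + 1.0) / 2.0      # -1 -> 0, 0 -> 0.5, 1 -> 1
    return np.round(rescaled, 2)    # keep two decimal places


# Usage: the endpoints and midpoint of [-1, 1]
print(to_unit_interval(np.array([-1.0, 0.0, 1.0])))
```

Note that `np.round` rounds exact halves to the nearest even value, which can differ from other rounding conventions in the last digit.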
$\epsilon$-accuracy
diff <- |target_set - predicted_set| // absolute pixelwise difference, stored as (10000, 784)
accu <- 0 // accumulator
loop elem in diff // for each row of diff, i.e. for each image (784,)
accu <- accu + |{i ∈ elem : i < ε}| / 784 // count how many pixels have an error < ε and average over pixels, i.e. divide by 784
end loop
accu <- accu / 10000 // average over the examples in the image set
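The loop above can be vectorised with NumPy. This is a minimal sketch, assuming `target` and `predicted` are float arrays of shape (num_images, 784) already rescaled to $[0, 1]$, and taking the absolute difference so that errors of either sign are counted; `epsilon_accuracy` is a hypothetical name.

```python
import numpy as np


def epsilon_accuracy(target: np.ndarray, predicted: np.ndarray, eps: float) -> float:
    """Average fraction of pixels whose absolute error is below eps.

    target, predicted: shape (num_images, num_pixels), e.g. (10000, 784).
    """
    diff = np.abs(target - predicted)       # pixelwise absolute error
    per_image = (diff < eps).mean(axis=1)   # fraction of pixels below eps, per image
    return float(per_image.mean())          # average over the image set


# Usage on a small random set (100 images of 784 pixels)
rng = np.random.default_rng(0)
target = rng.random((100, 784))
predicted = np.clip(target + rng.normal(0.0, 0.05, size=target.shape), 0.0, 1.0)
print(epsilon_accuracy(target, predicted, eps=0.1))
```

Because every image has the same number of pixels, averaging per image and then over images gives the same value as a single mean over all pixels; the two-step form is kept only to mirror the pseudocode.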
$\epsilon$-outliers
diff <- |target_set - predicted_set| // absolute pixelwise difference, stored as (10000, 784)
accu <- 0 // accumulator
loop elem in diff // for each row of diff, i.e. for each image (784,)
accu <- accu + |{i ∈ elem : i > ε}| / 784 // count how many pixels have an error > ε and average over pixels, i.e. divide by 784
end loop
accu <- accu / 10000 // average over the examples in the image set
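A vectorised NumPy sketch of the $\epsilon$-outliers loop, under the same assumptions: float arrays of shape (num_images, 784) in $[0, 1]$, with the absolute difference so that both over- and under-predictions count as outliers; `epsilon_outliers` is a hypothetical name.

```python
import numpy as np


def epsilon_outliers(target: np.ndarray, predicted: np.ndarray, eps: float) -> float:
    """Average fraction of pixels whose absolute error exceeds eps.

    target, predicted: shape (num_images, num_pixels), e.g. (10000, 784).
    """
    diff = np.abs(target - predicted)       # pixelwise absolute error
    per_image = (diff > eps).mean(axis=1)   # fraction of outlier pixels, per image
    return float(per_image.mean())          # average over the image set


# Usage on a small random set (100 images of 784 pixels)
rng = np.random.default_rng(0)
target = rng.random((100, 784))
predicted = np.clip(target + rng.normal(0.0, 0.05, size=target.shape), 0.0, 1.0)
print(epsilon_outliers(target, predicted, eps=0.1))
```

For the same $\epsilon$, the $\epsilon$-outliers value equals one minus the $\epsilon$-accuracy, except for pixels whose error is exactly $\epsilon$, which neither metric counts.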
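$\operatorname{mse} = \frac{1}{m} \sum_{j=1}^{m} \frac{1}{n} \lVert Y_j-\hat{Y}_j \rVert_2^2 = \frac{1}{m} \sum_{j=1}^{m} \frac{1}{n} \sum_{i=1}^{n} (y_{j,i}-\hat{y}_{j,i})^2$
where $m$ is the number of images in the set ($10000$), $n$ is the number of pixels per image ($784$), $Y_j$ and $\hat{Y}_j$ are the target and predicted image $j$ as vectors of $n$ pixels, and $y_{j,i}$ and $\hat{y}_{j,i}$ are their $i$-th pixels.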
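The MSE formula maps directly onto array operations: square the pixelwise difference, average over the $n$ pixels of each image, then over the $m$ images. A minimal NumPy sketch, assuming `target` and `predicted` are float arrays of shape $(m, n)$ with one flattened image per row; `mse` is a hypothetical helper name.

```python
import numpy as np


def mse(target: np.ndarray, predicted: np.ndarray) -> float:
    """Mean squared error averaged per image, then over images.

    target, predicted: shape (m, n), one flattened image per row.
    """
    diff = target - predicted
    per_image = (diff ** 2).mean(axis=1)   # (1/n) * sum_i (y_ji - yhat_ji)^2
    return float(per_image.mean())         # (1/m) * sum over the m images


# Usage: two images of two pixels each
target = np.zeros((2, 2))
predicted = np.array([[1.0, 0.0], [0.0, 2.0]])
print(mse(target, predicted))
```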
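$\operatorname{mae} = \frac{1}{m} \sum_{j=1}^{m} \frac{1}{n} \lVert Y_j-\hat{Y}_j \rVert_1 = \frac{1}{m} \sum_{j=1}^{m} \frac{1}{n} \sum_{i=1}^{n} |y_{j,i}-\hat{y}_{j,i}|$
where, as for the MSE, $m$ is the number of images, $n$ the number of pixels per image, and $y_{j,i}$ and $\hat{y}_{j,i}$ the target and predicted value of pixel $i$ in image $j$.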
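The MAE follows the same pattern with the absolute value in place of the square. A minimal NumPy sketch, assuming float arrays of shape $(m, n)$ with one flattened image per row; `mae` is a hypothetical helper name.

```python
import numpy as np


def mae(target: np.ndarray, predicted: np.ndarray) -> float:
    """Mean absolute error averaged per image, then over images.

    target, predicted: shape (m, n), one flattened image per row.
    """
    diff = target - predicted
    per_image = np.abs(diff).mean(axis=1)  # (1/n) * sum_i |y_ji - yhat_ji|
    return float(per_image.mean())         # (1/m) * sum over the m images


# Usage: two images of two pixels each
target = np.zeros((2, 2))
predicted = np.array([[1.0, 0.0], [0.0, 2.0]])
print(mae(target, predicted))
```

On the same inputs the MSE weights the error of $2$ four times as much as the error of $1$, while the MAE weights it only twice as much, which is why the MSE reacts more strongly to a few large pixel errors.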